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Abstract 

It is well known that the performance of sparse vector recovery algorithms from compressive measure- 
ments can depend on the distribution underlying the non-zero elements of a sparse vector However, 
the extent of these effects has yet to be explored, and formally presented. In this paper, I empirically 
investigate this dependence for seven distributions and fifteen recovery algorithms. The two morals of 
this work are: 1) any judgement of the recovery performance of one algorithm over that of another must 
be prefaced by the conditions for which this is observed to be true, including sparse vector distributions, 
and the criterion for exact recovery; and 2) a recovery algorithm must be selected carefully based on 
what distribution one expects to underlie the sensed sparse signal. 

I. Introduction 

Several researchers have observed that the distribution of the non-zero elements of a sparse vector can 
impact its recoverability from its observation, e.g., |[l|-|[5|. The degrees to which recovery algorithms 
are affected by the underlying distribution of a sparse vector have not been thoroughly investigated, both 
empirically and analytically, except perhaps in the case of £i-regularization |[3| and OMP ||6|. The worst 
case scenario appears to be sparse vectors distributed Bernoulli equiprobable in {—a, a} (or "constant 
amplitude random sign") Q, Q. Maleki and Donoho [4] use these types of vectors to tune three recovery 
algorithms. They also briefly investigate the change in recovery performance by their tuned algorithms 
for sparse vectors distributed Cauchy, uniform in [—1, 1], and Laplacian Q; but this is just to show that 
vectors distributed differently than Bernoulli have better recoverability. Their reasoning goes, if one's 
sparse vector has a different distribution, the possibility of recovery from compressive measurements will 
be better than expected. 
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In this article, I empirically compare the performance of fifteen algorithms for the recovery of com- 
pressively sampled sparse vectors having elements sampled from seven distributions. The algorithms I 
test can be grouped into four categories: greedy iterative descent, thresholding, convex relaxation, and 
majorization. The greedy approaches I test are: orthogonal matching pursuit (OMP) |[l|, |[7|, stagewise 
OMP 1 8], regularized OMP |9|, and probabilistic OMP (PrOMP) For the thresholding algorithms, I 
test the recommended algorithms produced by Maleki and Donoho Q — which includes iterative hard 



and soft thresholding (IHT, 1ST) 1 11 1, and two-stage thresholding (TST) — Approximate Message Passing 
(AMP) g, Subspace Pursuit (SP) ||2|, and CoSaMP |jT2j, and Algebraic Pursuit with 1-memory (ALPS) 
|13|. The convex relaxation methods I test inclde -minimization (BP) |14|, iteratively reweighted ii- 



minimization (IRll) |15|, and Gradient Projection for Sparse Reconstruction (GPSR) |[T6]. Finally, the 
majorization approach is the smoothed Iq technique (SLO) |17]. The seven distributions from which I 
sample sparse vectors are: Normal; Laplacian; uniform; Bernoulli; bimodal Gaussian; bimodal uniform; 
and bimodal Rayleigh. I test several pairs of problem sparsities and indeterminacies for sensing matrices 
sampled from the uniform spherical ensemble. 

These comparisons provide many interesting observations, many of which are known but yet to be 
formally stated. First, I find that there is no one algorithm that outperforms the others for all distributions 
I test when using exact recovery of the full support. SLO and BP/ AMP however, do appear to be the 
best in all cases. Some algorithms are extremely sensitive to the distribution underlying sparse vectors, 
and others are not. Greedy methods and majorization perform better than £i -minimization approaches for 
vectors distributed with probability density concentrated at zero. Recovery of Bernoulli distributed vectors 
shows the lowest rates. Thus, choosing a recovery algorithm should be guided by the expected underlying 
distribution of the sparse vector. I also find that one can obtain an inflated recovery performance with 
a seemingly reasonable criterion for exact recovery. For some distributions, such a criterion may not 
produce results that accurately reflect the true performance of an algorithm. 

In the next section, I briefly review compressed sensing and my notation. Then I describe each of the 
fifteen algorithms that I test. Section 3 provides the main menu of results from my numerous simulations. 
Here I look at the effects on performance of perfect recovery criterion, and the effects of sparse vector 
distributions. I also compare all algorithms for each of the distributions I test through inspecting their 
phase transitions. I conclude with a summary, and avenues for future research. 
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II. Compressed Sensing and Signal Recovery 

Given a vector of ambient dimension A^, x G C^, we sense it by $ : — )• C", producing 
a measurement vector u = $x. Several methods have been developed and studied for recovering x 
given u and which are obviously sensitive to both the size and content of $ relative to x. The 
overdetermined problem (A^ < m) has been studied in statistics, linear algebra, frame theory, and others. 



e.g., 1 18 1, 1 19 1. The underdetermined problem (A^ > m), with x a sparse vector, and $ a random matrix. 



has been studied in approximation theory p9| , |20|, and more recently compressed sensing |21|-|23|. 
From here on, we are working with vectors that have ambient dimensions larger than the number of 
measurements, i.e., N > m. Below, I briefly review the fifteen algorithms I test, and provide details 
about their implementations. 



A. Notations 

We define the support of a vector x G as the indices of those elements with non-zero values, i.e., 

5(x) := {n G : [x]„ / 0} (1) 

where [a]„ is the nth component of the vector, and Q := {1,2,..., N}. A vector x is called s-sparse 
when at most s of its entries are nonzero, i.e., |5(x)| < s. As the work of this paper is purely empirical, 
I only consider finite-energy complex vectors in a Hilbert space of dimension A^ < oo. Here, the inner 
product of two vectors is defined (a, b) := b*a where b* is the conjugate transpose; and the £p-norm 
for 1 < p < oo of any vector in this space is defined 

N 

M\l:=Y.\W. (2) 

n=l 

In compressed sensing, the sensing matrix $ = [¥'i|</'2l • • • \'Pn] maps a length- A^ vector to a lower 
m-dimensional space. I define ^q,^ as a matrix of the k columns of $ indexed by the ordered set il^ C Jl, 
i.e., $f7^ = ['PQ,^{i)\'-PQ.k{2) \ ■ ■ ■ where $7^(1) is the first element of il^. I notate a set difference 

as Alternatively, one may speak of a dictionary of atoms, V = {ip^ G C^jng^. For the problem 

of recovering the sensed vector from its measurements, one defines its indeterminacy as 5 := m/N, 
and its sparsity as p := s/m. The problem indeterminacy describes the undersampling of a vector in 
its measurements; and the problem sparsity describes the proportion of the sensing matrix active in the 
measurements. 
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B. Recovery by Greedy Pursuit 

Greedy pursuits entail the iterative augmentation of a set of atoms selected from the dictionary, and 
an updating of the residual. I initialize all the following greedy methods by Qq = 0, and tq = u, and 
define the stopping criteria to be ||r/^H2 < 10^^||u||2, or > 2s, unless otherwise specified. 

1) Orthogonal Matching Pursuit (OMP) /7j/.- OMP augments the kth index set Qk+i = J^fc U {n^} by 
selecting a new index according to 

Uk = argmin||rfc - {vk, (fJfnWl = argmax |(rfc, (3) 

where the kth residual is defined as the projection of the measurements onto the subspace orthogonal to 
that spanned by the dictionary elements indexed by Qk, i-C, 

rfc:=P^u=(I^-*f,,*t,Ju (4) 

where ^l^^ := and Im is the size-m identity matrix. OMP thereby ensures each 

residual is orthogonal to the space spanned by the atoms indexed by Qk- OMP creates the A;th solution 
by 

Xfc = argmin ||u - *If7^.x||2 (5) 

X 

where the N square matrix = 1 Vj G $7^, and zero elsewhere. The implementation I us^involves 

a QR decomposition to efficiently perform the projection step. 

2) Probabilistic OMP (PrOMP) f^: PrOMP augments the A;th index set Qk+i = ^ {nk} by 
sampling from ~ P{n G J7*|rjt), where the set Q,* denotes the true indices of the non-zero entries of 
X, and the residual is defined in Q. Divekar et al. pO| estimates this distribution by 



P{n e n*\rk) = < 



{l-p)/l, \{rk,'Pn)\>fl 

p/(^N-k-l), 0<\{rk,<p^)\<fi (6) 
0, Kr,,<^„)|=0 
where < p < 1, and fi is the Ith largest value in {|(rfc, : n G Q}, 1 < I < N — k. In this way, 
PrOMP produces several candidate solutions (|5]), which can include that found by OMP, and selects the 
one that produces the smallest residual ||rfc||2. In my implementation, I set p = 0.001, I = 2, and have 
PrOMP generate at most 10 solutions. I determined these values by experimentation, but not formalized 
tuning. I make PrOMP stop generating each solution using the same stopping criteria of OMP. 



SolveOMP .m in SparseLab: ^ http://sparselab.stanford.ed u/ 
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3) Regularized OMP (ROMP) f^: ROMP augments the kth index set ^k+i = il^U J* where J* C ^ 
is a set of indices determined in the following way. First, ROMP defines the set 

I ■.= {iGn:\{rk,<^i)\>0} (7) 

where is defined in Q, and because of this we know / n 17^ = 0. Next, ROMP finds the set of L 
disjoint sets J = { Ji, J2, ■ ■ ■ , Jl ■ \Ji\ ^ s} where 

Jl = {i G / : \{rk,fj)\ > Jmax|(rfc,</5j|} (8) 

J2 = {j G I\Ji : \{rk,<Pj)\ > I max \{rk,^i)\} (9) 

2 ie/\Ji 



Jl = {j G A Uti Jl ■■ \{rk,^j)\ > I max \{rk,^,)\}. (10) 

From this set, ROMP chooses the best defined by 

J* = argmax||$}rfc||2. (11) 

The solution at a given iteration is defined in Q. Note that ROMP requires one to specify the solution 
sparsity desired. 

4) Stagewise OMP (StOMP) Ij^: StOMP augments the fcth index set ilk+i = ilfc U J* where 

J* ■={iGn:\{rk,^,)\>tk(Jk} (12) 

where tj^Ufc is a threshold parameter, and is defined in Q. The solution at a given iteration is defined 
in In their work |8J, Donoho et al. define ak = \\rk\\2/ y/rn, and 2 < < 3, motivated from either 
avoiding false alarms (which requires knowing the sparsity of the solution) or missed detection (false 
discovery rate). I retain the defaults of the implementation I use|^ which means the threshold is set by 
the false discovery rate with tk = 2. Donoho et al. recommend running this procedure only 5-10 times, 
but here I ran it up to 2s times which I find does not degrade its performance. 

C. Recovery by Thresholding 

Thresholding techniques entail the iterative refinement of a solution, and an update of the residual. I 
initialize all the following thresholding methods by xq = 0, and restrict the number of refinements to 
k < 300, or when llrfclU < 10~^||u||2. 



^SolveStOMP .m in SparseLab: 



http ://sparselab. stanford.edu/ 
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1 ) Iterative Thresholding /[77]/; Iterative thresholding refines the solution according to 

Xfc+i = r(xfc + K**rfc;rfc) (13) 

where T(y) is a thresholding function applied element-wise to y, > is a threshold, < k < 1 is 
a relaxation parameter, and the residual is = u — $Xfc. For iterative hard thresholding (IHT), this 
function is defined 

X, \x\ > Tk 



T{x;Tk) := <^ 



(14) 
0, else. 



For iterative soft thresholding (1ST), this function is defined 

{Sgn(x)(|x| - Tk), \x\ > Tk 
(15) 
0, else. 

The implementations of IHT and 1ST I use are the "recommended versions" by Maleki and Donoho Q, 
where they set k = 0.65 for IHT and 0.6 for 1ST, and adaptively set Tk according to a false alarm rate, 
the problem indeterminacy, and an estimate of the variance of the residualj^ These settings come from 
extensive empirical tests for sparse signals distributed Bernoulli. 

2) Compressive Sampling MP (CoSaMP) p2^: CoSaMP refines the solution x^ by two stages of 
thresholding. First, given a sparsity s CoSaMP finds the support 

J:=S[T2s(,^*rk;e)] (16) 

where T2s(y) nulls all elements of y except for the 2s ones with the largest magnitudes above e > 0. 
CoSaMP then thresholds again to find the new support 



k+l 



r5(argmin||u- $l5(x)ujx||2; e) 



(17) 



where Ts{y) nulls all elements of y except for the s ones with the largest magnitudes above e > 0. 
The new solution Xk+i is then computed by ([s]). In my implementation of CoSaMP, I set e = 10^^^ to 
avoid numerical errors; and in addition to the stopping criterion mentioned above, I exit the refinements 
if ||rfc|| > ||rfc_i||, in which case I choose the previous solution. 

3) Subspace Pursuit (SP) /p]/.' SP operates in the same manner as CoSaMP, but instead of retaining 



the 2s largest magnitudes in ( 16l, it keeps only s. The stopping criteria of my implementation of SP are 
the same as for CoSaMP 



http://sparselab.stanford.edu/OptimalTumng/main.htm 
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4) Two-stage Thresholding (TST) Noting the similarity between the two, Maleki and Donoho 
generalize CoSaMP and SP into TST. Given x^., r^, and s, TST with parameters {a, (3) finds 

J:=5[r«s(xfc + K**rfc)] (18) 

where < k < 1 is a relaxation parameter, and Tasiy) nulls all elements of y except for the as ones 
with the largest magnitudes. TST then thresholds again to find the new support 



k+l 



7>s(argmin||u- *l5(x)ujx||2) 



(19) 



where T^s(y) nulls all elements of y except for the /3s ones with the largest magnitudes. The new solution 
Xfe+i is then computed by ([5]). Obviously, when a = 2 and /3 = 1, TST becomes similar to CoSaMP; 
and to become similar to SP, a = /3 = 1. The implementation of TST I use is that recommended by 
Maleki and Donoho where they choose a = /3 = 1, define k = 0.6, and estimate the sparsity 

s = [(0.044417(5^ + 0.34142(5 + 0.14844)mJ . (20) 

These values come from their extensive experiments. Note that one iteration of SP (or CoSaMP) and one 
iteration of TST are only equivalent when 

cS[T,(xfe + K**rfe)] =5(xfc) j5[T,(**rfc)]. (21) 

Since the thresholding is non-linear, TST will not produce the same results as SP (or as CoSaMP). 
5) Approximate Message Passing /i^: AMP proceeds as iterative thresholding, but with a critical 



difference in how it defines the residual in (13 1. Given x^, Xfc_i and as the mth largest magnitude in 



Xfc_i, AMP with soft thresholding defines the /c-order residual 



rfc = u - *Xfc + rfc_i — ||xfc_i + **rfc_i||o,Tfc (22) 
m 



where ||y||o,r := : |[y]n| > t}\- AMP then refines the solution x^ by soft thresholding (15l 



Xfc+i =r(xfc + **rfc;Tfc). (23) 

AMP repeats the procedure above until some stopping condition. In the implementation I usej^ I make 
AMP stop refinement when ||rfc||2 < 10~^||u||2 or A; > 300. 



See http://sparselab.stanford.edu/OptimalTumng/main.htm 
^http://people.epfl.ch/ulugbek.kamilov 
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6) Algebraic pursuit (ALPS) with 1-memory [13]: ALPS essentially entails accelerated iterative hard 
thresholding with memory. Given and a desired sparsity s, ALPS with 1-memory refines the solution 

Xfc+i = Ts{hk) + Hk[Ts{hk) - Ts(hk-i)] (24) 

where fxk £ (0, 1], and defining the expanded support set Ik '■= ^(xfc) U 5(rs[lQ\5(xj^)$*(u — *xjt)]) 

bfc :=Xfc + KfcIj,**(u-*Xfc) (25) 

where the optimal step size is given by 

_ ||Ix.^*(u-^Xfe)|b 

||*IX,**(U-$X,)||2- ^^^^ 

In the implementation I usej^ALPS computes the best weight at each step fi^ according to FISTA |j24|. 
Note that for ALPS I specify the correct sparsity, as I do for ROMP, CoSaMP and SP 

D. Recovery by Conxex Relaxation 

The following methods attempt to solve the sparse recovery problem 

min |5(x)| subject to u = $x (27) 

by replacing the non-convex measure of strict sparsity with a relaxed and convex measure. 

1) £i-minimization [J?]: The principle of Basis Pursuit (BP) replaces strict sense sparsity with 



the relaxed and convex £i norm. We can rewrite ( 27 1 in the following two ways 



min ||x||i subject to u = $x (28) 
min^llu- *x||2 + A||x||i (29) 



both of which can be solved by several methods, e.g., simplex and interior-point methods |25|, as a linear 



program |14| , and by gradient methods |16]. In my implementation, I solve p8| ) using the CVX toolbox 
11261. 



2) Gradient Projection for Sparse Reconstruction (GPSR) 116]: GPSR provides a computationally 



light and iterative approach to solving (29 1, or at least taking one to the neighborhood of the solution, by 
using gradient projection, thresholding, and line search. The details of the implementation are out of the 
scope of this paper, but suffice it to say I am using the "Basic" implementation provided by the authors 



161 with their defaults, and A = 0.0051 1 **u| 



oo • 



'^http://lions.epfl.ch/ALPS 
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3) Iteratively Reweighted ii-minimization (IRll) [151: Given a diagonal square matrix W^, IRll 
solves 

min ||Wfcx||i subject to u = $x. (30) 
Using the solution, IRll constructs a new weighting matrix 

[Wfc+i]„,„ = l/(|[xfc+i]„| +e) (31) 



where e > is set for stability. IRll then solves (30 1 with these new weights, and continues in this way 



until some stopping criterion is met. For initialization, Wq = I. In my implementation — which uses 



CVX as for BP — I make e = 0.1 as done in | [T5| , and limit the number of iterations to 4, or until 
llrfclb < 10-5||u||2. 
4) Smoothed io (SID) If we define 

m 

J(x;a):= J]exp[-[x]2/2a2]. (32) 

i=l 



then the strict sense sparsity in {21) can be expressed 

|5(x)| = m — lim J(x; a). (33) 

a—>-0 

For a decreasing set of variances {a, ad, ad"^, . . .} for < d < 1, SLO finds solutions to 

Xfc = argmin^llu- $x||2 - AJ(x;crfc) (34) 

using steepest descent with soft thresholding, and reprojection back to the feasible set. Finally, depending 
on the last ai, SLO arrives at a solution that will be quite close to the unique sparsest solution. In the 
implementation I use provided by the authors]^ cr := 2||$1^u||oo where here = I set 

d = 0.95, and restrict the last variance to be no larger than 4 x 10^^. 

III. Computer Simulations 

My problem suite is nearly the same as that used in the empirical work of Maleki and Donoho PI, 
except here I make N = 400 instead of 800, and test 50 realizations (instead of 100) of each sparse 
vector at each pair of problem sparsity and indetereminacyj^ I sample each real sensing matrix from the 

^http://ee. Sharif. ir/~SLzero 

^I am currently running simulations at the original dimensions used by Maleki and Donoho, but preliminary results do not 
show much difference with those presented below. 
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uniform spherical ensemble, where each column of $ is a point on the A^-dimensional unit sphere, and 
each element of each column of $ is independently and identically distributed zero-mean Gaussian. 

I generate each real s-sparse vector in the following way. First, I determine the indices for the non-zero 
elements by drawing s elements at random (without replacement) from Q. Then, for the vector elements 
at these indices, I independently sample from the same distribution of the following: 

1) Standard Normal (N) 

2) Zero-mean Laplacian (L) , with parameter A = 10 

3) Zero-mean uniform (U) in the support [—1,1] 

4) Bernoulli (B) equiprobable in { — 1,1} ("Constant Amplitude Random Sign" in Q) 

5) Bimodal Gaussian (BG) as a sum of two unit variance Gaussians with means ±3 

6) Bimodal uniform (BU) as a sum of two uniform distributions in [—4, —2] and [2, 4] 

7) Bimodal Rayleigh (BR), where the magnitude value is distributed Rayleigh with parameter a = 3. 
I sample from a distribution until I obtain a value with a magnitude greater than 10^^*^. This ensures that all 
sparse components have magnitudes 100 times greater than my specification of numerical precision (e < 
10^^^) in CoSaMP and SR Figure [T] shows the empirical distributions of all the signals I compressively 
sample. 

The problems I test have 30 different linearly-spaced sparsities p S [0.05, 1]. For each of these, I test 
16 different linearly-spaced indeterminacies 5 G [0.05, 0.5414] For each sparsity and indeterminacy pair 
then, I find the proportion of 50 vectors sensed by a $ that are recovered by the fifteen algorithms I 

'l am testing a wider range in my current simulations, but in my opinion, the most important part is when a signal is 
compressed to a high degree. 
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describe above. Before I test for recovery, I "debias" 1 16 1 each solution in the following way. I first find 
the effective support of a solution x 

Xx := S{Ti[xk]) ■■ rank{*5(T.rx,i)} = /,min||xfc - Ti{xk)\\2- (35) 

l<m 

In other words, I find a skinny submatrix of $ that has the largest (full column) rank of / associated 
with the / largest magnitudes in the solution. Finally, I produce the debiased solution by solving (|5]) with 
rjfc = Z^, and finally hard thresholding, i.e., 

i = T(argmin||u - *Ix.x||2; 10"^°) (36) 

X 

At each problem indeterminacy, I interpolate the recovery results over all sparsities to find where 
successful recovery occurs with a probability of 0.5. This creates an empirical phase transition plot 
showing the boundary above which most recoveries fail, and below which most recoveries succeed. 

A. Exact Recovery Criteria and Their Effects 

I consider two different criteria for exact recovery. First, I consider a solution recovered exactly when 

l|x — xllo „ 

" II II " < ex (RO 
for some ex > 0. We can relate (Re^) to the stopping criterion of the algorithms above, i.e., when 

llu — ^xllo 

" II II " < eu (37) 

l|u||2 

with eu > 0, and x is defined by ([5]). We can rewrite this as 

< (38) 

||*x||2 

and bound it considering the frame bounds of the sensing matrix, i.e., 

^||x||2 < ||*x||2 < S||x||2 Vx. (39) 

with < yl < < oo. Noticing that ||$(x — x)||2 > A||x — x||2, and ||$x||2 < i?||x||2, we can bound 
the expression from below by 

ANx-xlU ||*(x-x)||2 
'' II II " < " |V^ II ^" < eu (40) 

-B||x||2 ll*x||2 

and thus 

I |x — x| I2 B 

" " < -reu. (41) 



|x||2 A 
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From these we see that given (37 1 the recovery condition (Rg^) must be true when 

B 



A 



(42) 



10 



-2 



as done in Q, and set eu = 10 ^ (but Maleki and Donoho set 



In my experiments, I specify 
the latter = 10~^). 

The second recovery condition I use is the recovery of the support, in which case I consider a solution 
recovered exactly when 

5(x)=5(x). (Rs) 
When the criterion (Rg) is true, {Rg^ i is necessarily true by virtue of (jsjl with 17^ = 5(x). We might 



consider relating this recovery condition to that in ( R^^ I in the following sense. Given that only a portion 



of the true support is recovered, what are the conditions that criterion (Ri^) is true? Consider without 
loss of generality that an algorithm has recovered the true support except the first < \T\ < |5(x)| 
elements, i.e., 5(x) UT = 5(x), and 5(x) n T = 0. For simplicity I denote S = 5(x) and S = 5(x). 
If we assume that the atoms associated with the elements in T are orthogonal to the rest of the atoms 
in the support S, then we can write the solution as 



+ = X + $^u. 



Using the orthogonality assumption, we can write the left hand side of ( Rg^ ) as 

_ ||*i-*x||| _ ||*:^[*rP]x||i _ ||[*i-*r|0]x| 



X 



Ixrili 



(43) 



(44) 



as expected. And thus, we see one way to guarantee condition (Rg^) is for ||x7-||2 < ex||x||2. If we 
remove the orthogonality assumption, this becomes 



*5*tx||2 < ex||x||2. 



(45) 



Now, consider that IT] = |5| — 1, i.e., that we have missed all but one of the elements, but the recovery 
algorithm has found and precisely estimated the largest element with a magnitude a. Assume that all 
other non-zero elements have magnitudes less than or equal to /3. Thus, ||x — xjH < — l)/3^; and 



|x||| < — 1)/?^ + a^. From the above analysis, we see 



X — X 



< 



,,^,,2 „^,-l)/32+a2 

We can see that the criterion ( Rg^ i is still guaranteed as long as 

a' ^ (1-4)(|5|-1) _ 



(46) 



(47) 
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(a) Normal 



(b) Laplacian 
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Fig. 2. Differences between the empirical phase transitions for five recovery algorithms using criterion (7?^) and criterion 
(JJfj). Note different axis scaling in (d). 



This analysis shows that we can substantially violate the criterion ( Rs I and still meet ( R^^ i as long as 
the distribution of the sparse signal permits it. For instance, if all the elements of x have unit magnitudes, 
then missing \T\ elements of the support produces ||x;5 |||/||x||2 = |T|/|5|; and to satisfy (Ri^ 



this 



requires \/|T|7[^ ^ ^x- This is not likely unless \S\ is very large and we miss only a few elements. If 
instead our sparse signal is distributed such that it has only a few extremely large magnitudes and the 



rest small, then we can miss much more of the support and still satisfy {Ri^_ I. 

The following experiments test the variability of the phase transitions of several algorithms depending 
on these success criteria, and the distribution underlying the sparse signals. Figure [2] shows the differences 



in empirical phase transitions of five recovery algorithms using {Rs) or (-^£2 Since {Rs) implies {Rt^_ 



the empirical phase transition of the latter will always be equal to or greater than that of the former. 
The empirical phase transitions difference is zero across all problem indeterminacies when there is no 
difference between these two criteria. Figure [2] shows a significant dependence of the empirical phase 
transition and the success criteria for four sparse vector distributions. For all other algorithms, and the 
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three other distributions (Bernoulli, bimodal uniform, and bimodal Gaussian), the differences are nearly 
always zero. 

I do not completely know the reason why only these five algorithms out of the 151 test show significant 
differences in their empirical phase transitions, or why GPSR and StOMP appear the most volatile of these 
algorithms. It could be that they are better than the others at estimating the large non-zero components 
at the expense of modeling the small ones. However, this experiment clearly reveals that for sparse 



vectors distributed with probability density concentrated near zero, criterion ( R^^ I does allow many small 



components to pass detection without consequence. The more probability density is distributed around 



zero, the more criterion (R^^ i is likely to be satisfied while criterion (Rs) is violated. It is clear from 



this experiment that we must take caution when judging the performance of recovery algorithms by a 



success criterion that can be very lax. In the following experiments, I use criterion ( Rs I to measure and 



compare the success of the algorithms since it also implies {Ri 



B. Sparse Vector Distributions and their Ejfects 

Figures [3] and [4] show the variability of the empirical phase transitions for each algorithm depending 
on the sparse vector distributions. The empirical phase transitions of BP and AMP are exactly the same, 
and so I only show one in Figure |3ja). Clearly, the performance of BP, AMP, and recommended TST 
and 1ST, appear extremely robust to the distribution underlying sparse vectors. In Donoho et al., 
prove AMP to have this robustness. ROMP also shows a robustness, but across all indeterminacies its 
performance is relatively poor (note the difference in y-axis scaling). IRU and GPSR shows the same 
robustness to Bernoulli and the bimodal distributions; but they both show a surprisingly poor performance 
for Laplacian distributed vectors, even as the number of measurements increase. GPSR appears to have 
poorer empirical phase transitions as the probability density around zero increases. I do not yet know 
why IRU fails so poorly for Laplacian vectors. The specific results for recommended 1ST, IHT, and TST 
however, appear very different to those reported in |4|, though the main findings of their work comport 
with what I show here. 

The recovery performance of OMP, PrOMP, and SLO varies to the largest degree of all algorithms that 
I test. It is clear that these algorithms perform in proportion to the probability density around zero. In 
fact, for eight of the algorithms I test (IHT, ALPS, StOMR CoSaMP, SP OMR PrOMR and SLO) we can 
predict the order of performance for each distribution by the amount of probability density concentrated 
near zero. From Fig. [T] we can see that these are, in order from most to least concentrated: Laplacian, 
Normal, Uniform, Bimodal Rayleigh, Bimodal Gaussian, Bimodal uniform, and Bernoulli. This behavior 
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is reversed for only recommended 1ST, GPSR, and ROMP, where their performance increases the less 
concentrated probability density is around zero. 



January 18, 2013 



DRAFT 



16 



(a) BP, AMP 



(b) Recommended TST 



0.6 
0.5 

0.4 






































8G. 


L N 


0.3 
0.2 
0.1 


















R 


































































0.05 0.1 0.15 



0.2 0.25 0.3 0.35 0.4 0.45 0.5 0.55 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 0.55 
Indeterminacy m/N Indeterminacy m/N 



(c) Recommended 1ST 



(d) IRU 



























































BR 


B 














BU 


BG 















































































BR 




















BG 


'b ^ 












U 


























L 























0.05 0.1 0.15 



0.2 0.25 0.3 0.35 0.4 0.45 0.5 0.55 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 0.55 
Indeterminacy m/N Indeterminacy m/N 



(e) GPSR 



(f) ROMP 




0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 0.55 
Indeterminacy m/N 



0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 0.55 
Indeterminacy m/N 



(g) Recommended IHT 



(h) ALPS 




E 

io 



0.6 
0.5 






















0.4 




















L 


0.3 




















_ BR 


0.2 




















0.1 

























0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 0.55 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 0.55 
Indeterminacy m/N Indeterminacy m/N 



Fig. 3. Empirical pliase transitions using criterion \Rs\ of nine recovery algorithms for a variety of sparse vector distributions: 
Normal (N), Laplacian (L), Uniform (U), Bernoulli (B), Bimodal Gaussian (BG), Bimodal Uniform (BU), Bimodal Rayleigh 
(BR). Note different y-scaling for ROMP in (f). 
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Fig. 4. Empirical phase transitions using criterion \Rs\ of six recovery algoritlims for a variety of sparse vector distributions: 
Normal (N), Laplacian (L), Uniform (U), Bernoulli (B), Bimodal Gaussian (BG), Bimodal Uniform (BU), Bimodal Rayleigh 
(BR). Note different y-scale in (f). 
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C. Comparison of Recovery Algorithms for Each Distribution 

Figure [5] shows the same information as Figs. |3] and |4j but compares all fifteen algorithms together 
for single distributions. Here we can see that for sparse vectors distributed Bernoulli, bimodal uniform, 
and bimodal Gaussian, -minimization methods (BP, IRU, GPSR) and the thresholding approach AMP 
(which uses soft thresholding to approximate £i minimization), perform better than all the greedy methods 
and the other thresholding methods, and the majorization SLO. For the other four distributions, SLO, OMP 
and/or PrOMP outperform all the other algorithms I test, with a significantly higher phase transition in 
the case of vectors distributed Laplacian. In every case, AMP and BP perform the same. We can also 
see in all cases the phase transition for SP is higher than that for recommended TST, and sometimes 
much higher, even though for Maleki and Donoho's recommended algorithm they find that the (a, /?) 
pair that works best is that that makes it closest in appearance to SP. As I discuss about recommended 
TST above though, it is only similar to SP, and is not guaranteed to behave the same. Furthermore, and 
most importantly, in my simulations I provide SP (as well as CoSaMP, ROMP and ALPS) with the exact 
sparsity of the sensed signal, while recommended TST instead estimates it. Finally, in the case of sparse 
vectors distributed Laplacian, Normal, and Uniform, we can see that recommended IHT performs better 
than recommended TST, and sometimes even BP (Normal and Laplacian), which is different from how 
Maleki and Donoho orders their performance based on recovering sparse vectors distributed Bernoulli 

IS- 

Figure [6] shows the empirical phase transitions of the best performing algorithms for each distribution. 
BP and AMP perform the same for Bernoulli and bimodal uniform distributions. For the other five 
distributions, SLO performs the best at larger indeterminacies (6 > 0.2), and PrOMP performs better 
for these at smaller indeterminacies. From this graph we also see the large difference between the 
recoverability from compressive measurements of signal distributed Laplacian and Bernoulli. 
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Fig. 5. Comparison of phase transitions of all algorithms I test for each distribution. Note different y-scale in (g). 
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Fig. 6. Empirical phase transitions of the best peri^orming algorithms for each distribution. 
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D. Effects of Distribution on Probability of Exact Recovery 

Empirical phase transitions only show the regions in (5, p) -space where majority recovery does and 
does not hold. Another interesting aspect of these algorithms is how fast this transition occurs, and to what 
extents we can expect perfect recovery for all vectors. Figures [7] - [TT] compare for pairs of algorithms 
the probability of exact recovery as a function of sparsity for several problem indeterminacies for all 
distributions I test. 

In Fig. [7] we see that the transitions of BP and OMP are quite similar for all distributions except 
Normal and Laplacian, where OMP takes on a smaller transition slope than BP. At one extreme, for 
Bernoulli distributed signals, BP can perfectly recover signals with p < 0.35 (s < 76 for N = 400), 
while OMP can only recover all signals with p < 0.22 (s < 48 for N = 400). For Laplacian distributed 
signals, however, these are reversed, with BP perfectly recovering signals with p < 0.31 (s < 67 for 
N = 400), while OMP recovers all signals up to p < 0.44 {s < 95 for N = 400). 

Figure [8] compares the transition of probability for SLO and SP. They are quite similar for signals 
distributed Bernoulli, bimodal uniform, and bimodal Gaussian. For all other distributions I test they are 
much less similar; and perfect recovery for SLO for Laplacian distributed signals extends well beyond 
that of all the other algorithms. 

In Fig. [9]we see that the transitions of probability for CoSaMP and TST are quite similar for more dis- 
tributions except Bernoulli, and quite different from those of the other "two-stage thresholding" algorithm 
SP in Fig. [8] Both CoSaMP and TST have very steep transitions showing that the boundary between 
perfect recovery and majority recovery is quite narrow. A curious thing is that for most distributions I test, 
CoSaMP fails completely at p ^ 0.35. In the case of Laplacian signals, it is quite clear that for CoSaMP 
this sparsity cannot be exceeded no matter 6. I do not know at this time what causes this behavior. 



For the two hard thresholding approaches. Fig. 10 compares the transitions for IHT and ALPS. These 
more than any other, have quite irregular transitions for all distributions but Bernoulli and bimodal 
Uniform. We see for ALPS that only for these distributions can we expect perfect recovery for some 
sparsity. In all the others, ALPS never reaches 100% recovery, which is extremely problematic. The 
recommended IHT of Maleki and Donoho Q suffers no such problem, even though it must estimate the 
sparsity of the signal, while ALPS is given that information. 
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Fig. 7. Probability of exact recovery using criterion ^Rs\ for BP (solid) and PrOMP (dashed) as a function of problem sparsity 
and six problem indeterminacies from thickest to thinest lines: 5 — m/N = {0.05, 0.15, 0.25, 0.34, 0.44, 0.54}. 
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Fig. 9. Probability of exact recovery using criterion ^Rs\ for CoSaMP (solid) and TST (dashed) as a function of problem 
sparsity and six problem indeterminacies from thickest to thinest lines: S — m/N — {0.05, 0.15, 0.25, 0.34, 0.44, 0.54}. 
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Fig. 10. Probability of exact recovery using criterion \Rs) for CoSaMP (solid) and TST (dashed) as a function of problem 
sparsity and six problem indeterminacies from thickest to thinest lines: S — m/N — {0.05, 0.15, 0.25, 0.34, 0.44, 0.54}. 
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Figure 11 shows that StOMP has transition regions that are similar to those of ALPS, but over all 
distributions. We also see why in Fig. [5] StOMP has zero phase transition at low indeterminacies for 
vectors distributed bimodal Gaussian, bimodal Rayleigh, uniform, and Normal. I do not yet know the 
reason for these behaviors, but it could be due to the necessity to tune the false alarm rate of StOMP. 



Finally, Fig. 12 compares the transitions of IRU and GPSR. We see that the two are quite similar for 
vectors distributed Bernoulli, bimodal uniform, and bimodal Gaussian; but GPSR begins to break down as 
the probability density becomes more concentrated around zero. IRU has behavior that is inconsistent with 
what I would predict. For the most part, it acts fine for Normally distributed vectors, but its performance 
is very poor for vectors distributed uniform and especially Laplacian. At this time I do not know what 
troubles IRU for these distributions, or why it cannot perfectly recover Normal vectors with low sparsity, 
but it can recovery vectors with higher sparsity. 
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Fig. 11. Probability of exact recovery using criterion \Rs\ for StOMP (solid) and ROMP (dashed) as a function of problem 
sparsity and six problem indeterminacies from thickest to thinest lines: S — m/N — {0.05, 0.15, 0.25, 0.34, 0.44, 0.54}. 
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Fig. 12. Probability of exact recovery using criterion ^Rs\ for IRU (solid) and GPSR (dashed) as a function of problem 
sparsity and six problem indeterminacies from thickest to thinest lines: S — m/N — {0.05, 0.15, 0.25, 0.34, 0.44, 0.54}. 
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IV. Conclusion 

In this work I show that any judgement of which algorithm among several is the best for recovering 
compressively sensed sparse vectors must be prefaced by the conditions for which this is observed to be 
true. It is clear from my computer simulations that the performance of a recovery algorithm within the 
context of compressed sensing can be greatly affected by the distribution underlying the sensed sparse 
vector, and that the summary of performance is highly dependent on the criterion of successful recovery. 
These "findings" are certainly nothing novel, and clearly not controversial. It has already been stated in 
numerous pieces of research, and is somewhat codified as "folk knowledge" in the compressed sensing 
research community, that recovery algorithms are sensitive to the nature of a sparse vector, and that sparse 
vectors distributed Bernoulli appear to be the hardest to recover, e.g., |[T|-||6|. The extents to which this is 
true and meaningful, and the variability of performance to the criterion of successful recovery, however, 
had yet to be formally and empirically studied and presented. 

In light of this work, the important thing to ask moves from what recovery algorithm is the best, to 
what distribution underlies a compressively sensed vector, and what is the measure of success. In my 
experiments, we see that SLO performs extremely well in the sense of recovering the support (and thus the 
sparse vector exactly) for five distributions I test, except for low indeterminacies of Laplacian, Normal, 
bimodal Rayleigh, uniform, and bimodal Gaussian, where the greedy approach PrOMP performs slightly 
better. I do not yet know the reason for this, but it could be that the parameters for SLO must be adjusted. 
SLO does not perform as well as BP/AMP for sparse vectors distributed Bernoulli or bimodal uniform. 
With their performance, and because of their speed, SLO and AMP are together extremely attractive 
algorithms for recovering sparse vectors distributed in any of these seven ways. 

A critical question to answer is how these results change when the sensed vector is corrupted by 
noise, or when the sensing matrix is something other than from the uniform spherical ensemble. One 
potential problem with using the algorithms tuned by Maleki and Donoho Q is that they are tuned to 
sparse vectors distributed Bernoulli. It could be possible that they perform better for vectors distributed 
Laplacian if they are tuned to such a distribution; however, Maleki and Donoho argue that tuning on 
Bernoulli sparse vectors is essentially maximizing the best performance for the worst case, and that this 
translates to situations that are more forgiving. In my experiments, I do see that recommended IHT can 
perform better than recommended TST for sparse vectors distributed uniform. Normal, and Laplacian, 
which subverts their ordering of their algorithms in terms of performance. It stands to be reasoned then 
that we can better tune these algorithms for those situations such that recommended TST does outperform 
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recommended IHT. Finally, it is important to measure the performance of these algorithms for real-world 
signals that are sparsely described only in coherent dictionaries ||27|. 
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